AI ninja project [day 7] Speech to Text

2021 iThome Ironman Contest (13th), DAY 7

AI & Data

Part 7 of the AI ninja project series

wilsonsujames

2021-09-07 09:47:29

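Meetings do not always leave minutes behind.
If a meeting then makes a wrong decision that causes a problem,
it becomes hard to hold anyone accountable, and people may even act as if nothing ever happened.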

That is why we use GCP's Speech-to-Text service here.

After enabling the API, we can try it out locally:

Installation

pip install google-cloud-speech

Local usage

import os

from google.cloud import speech

# Point the client library at the service-account key file
credential_path = "cred.json"
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = credential_path


def transcribe_file(speech_file):
    """Transcribe the given local audio file."""
    client = speech.SpeechClient()

    # Read the whole file into memory; recognize() sends it inline with the request
    with open(speech_file, "rb") as audio_file:
        content = audio_file.read()

    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=8000,  # must match the recording's actual sample rate
        language_code="en-US",   # e.g. "zh-TW" for Traditional Chinese (Taiwan)
    )

    response = client.recognize(config=config, audio=audio)

    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print("Transcript: {}".format(result.alternatives[0].transcript))


transcribe_file("speach.wav")

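The last line converts the local WAV recording into text.
For language_code in the config, the supported languages (for example, zh-TW for Traditional Chinese) are listed at
https://cloud.google.com/speech-to-text/docs/languages
As for sample_rate_hertz, if it does not match the recording, the first run reports the file's actual sample rate,
so you may need to adjust the value before the program runs correctly.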
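Instead of waiting for the first run to report a mismatch, you can read the sample rate from the WAV header locally before calling the API. The sketch below uses only Python's standard-library `wave` module; the helper name `wav_config_params` is hypothetical, and it assumes an uncompressed PCM WAV file, which is what LINEAR16 expects.

```python
import wave


def wav_config_params(path):
    """Read a PCM WAV header and return values for RecognitionConfig.

    Hypothetical helper: assumes an uncompressed PCM WAV file.
    """
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()   # e.g. 8000, 16000, 44100
        channels = wav.getnchannels()      # 1 = mono, 2 = stereo
        sample_width = wav.getsampwidth()  # bytes per sample; LINEAR16 means 2

    # LINEAR16 is 16-bit linear PCM, so reject other sample widths early
    if sample_width != 2:
        raise ValueError(
            f"{path} uses {8 * sample_width}-bit samples; LINEAR16 expects 16-bit"
        )
    return {"sample_rate_hertz": sample_rate, "audio_channel_count": channels}
```

The result can be unpacked straight into the config, for example `speech.RecognitionConfig(encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16, language_code="zh-TW", **wav_config_params("speach.wav"))`, so the rate and channel count always match the file being sent.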

If the audio is stored in Google Cloud Storage (google-cloud-storage), the official docs also provide an example:

# Imports the Google Cloud client library
from google.cloud import speech
import os

credential_path = "cred.json"
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credential_path

# Instantiates a client
client = speech.SpeechClient()

# The name of the audio file to transcribe
gcs_uri = "gs://cloud-samples-data/speech/brooklyn_bridge.raw"

audio = speech.RecognitionAudio(uri=gcs_uri)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",  # use "zh-TW" for Traditional Chinese (Taiwan)
)

# Detects speech in the audio file
response = client.recognize(config=config, audio=audio)

for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))

As you can see, the main difference is the argument passed to audio = speech.RecognitionAudio():
content is replaced with uri.
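Because the two examples differ only in whether RecognitionAudio receives content or uri, a small helper can choose the keyword argument from the path. This is a hypothetical sketch: the name `recognition_audio_kwargs` is illustrative, and it simply treats any path starting with `gs://` as a Cloud Storage URI and everything else as a local file.

```python
def recognition_audio_kwargs(source):
    """Return keyword arguments for speech.RecognitionAudio (hypothetical helper).

    Paths beginning with "gs://" are passed through as a Cloud Storage URI;
    anything else is treated as a local file and read into memory.
    """
    if source.startswith("gs://"):
        return {"uri": source}
    with open(source, "rb") as audio_file:
        return {"content": audio_file.read()}
```

Either kind of source can then be used the same way, e.g. `audio = speech.RecognitionAudio(**recognition_audio_kwargs(gcs_uri))`.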

As for pricing, the first hour each month is free; after that, transcription costs roughly NT$45 per hour.
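As a rough budgeting aid, the monthly cost under the figures quoted above (first 60 minutes free, then about NT$45 per hour) can be estimated as below. This sketch prorates per minute and does not model Google's exact billing rules or later price changes, so check the official pricing page before relying on it.

```python
FREE_MINUTES_PER_MONTH = 60  # first hour each month is free (figure quoted above)
NTD_PER_HOUR = 45            # approximate NT$ per hour after the free tier


def estimate_monthly_cost_ntd(minutes_used):
    """Rough monthly cost in NT$, prorated per minute (approximation only)."""
    billable_minutes = max(0, minutes_used - FREE_MINUTES_PER_MONTH)
    return billable_minutes * NTD_PER_HOUR / 60
```

For example, ten one-hour meetings in a month (600 minutes) leave 540 billable minutes, which comes to about NT$405.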